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Abstract. We model messaging activities as a hierarchical doubly stochastic point process with 
three main levels, and develop an iterative algorithm for inferring actors' relative latent positions from 
a stream of messaging activity data. Each of the message-exchanging actors is modeled as a process 
in a latent space. The actors' latent positions are assumed to be influenced by the distribution of a 
much larger population over the latent space. Each actor's movement in the latent space is modeled 
as being governed by two parameters that we call confidence and visibility, in addition to dependence 
on the population distribution. The messaging frequency between a pair of actors is assumed to be 
inversely proportional to the distance between their latent positions. Our inference algorithm is 
^v^j based on a projection approach to an online filtering problem. The algorithm associates each actor 

with a probability density-valued process, and each probability density is assumed to be a mixture 
of basis functions. For efficient numerical experiments, we further develop our algorithm for the case 
where the basis functions are obtained by translating and scaling a standard Gaussian density. 
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1. Introduction. Communication networks are presenting ever- increasing chal- 
lenges in a wide range of applications, and there is great interest in inferential methods 
for exploiting the information they contain. A common source of such data is a cor- 
pus of time-stamped messages such as e-mails or SMS (short message service). Such 
messaging data is often useful for inferring a social structure of the community that 
generates the data. In particular, messaging data is an asset to anyone who would 
ly-^ like to cluster actors according to their similarity. A practitioner is often privy to 

messaging data in a streaming fashion, where the word streaming describes a prac- 
tical limitation, as the practitioner might be privy only to the incoming data in a 
fixed summarized form without any possibility to retrieve past information. It is in 
the practitioner's interest to transform the summarized data so that the transformed 
data is appropriate for detecting emerging social trends in the source community. 
We mathematically model such streaming data as a collection of tuples of the 



m 

form V = {(te,i(,je)} of time and actors, where ig and je € {l,...,n} represent 
actors exchanging the £-th message and t e € R + represents the occurrence time of 
the l-th message. There are many models suitable for dealing with such data. The 
. £h most notable are the Cox hazard model, the doubly stochastic process (also known 

^ as the Cox process) , and the self-exciting process (although self-exciting processes are 

sometimes considered as special cases of the Cox hazard model). For references on 
these topics, see Andersen et al. [1995], Snyder [1975] and Bremaud [1981]. All three 
models are related to each other; however, the distinctions are crucial to statistical 
inference as they stem from different assumptions on information available for (online) 
inference. To transform T> data to a data representation more suitable for clustering 
actors, we model I? as a (multivariate) doubly stochastic process, and develop a 
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method for embedding I? as a stochastic process taking values in R d for some suitably 
chosen d e N. 

2. Related works. For statistical inference when there is information available 
beyond T>, the Cox-proportional hazard model is a natural choice. In Heard et al. 
[2010] and Perry and Wolfe [2013], for instance, instantaneous intensity of messaging 
activities between each pair of actors is assumed to be a function of, in the language 
of generalized linear model theory, known covariates with unknown regression param- 
eters. More specifically, in Heard et al. [2010], the authors consider a model where 
Xij(t) — Aij(t)(Bij(t) + 1) with Aij(t) and Bij(t) representing independent counting 
processes, e.g., A^(i) are Bernoulli random variables and B^it) are random variables 
from the exponential family. On the other hand, in Perry and Wolfe [2013], a Cox 
multiplicative model was considered where \ij(t) = £i(t) exp{^QXij(t)}. The model 
in Perry and Wolfe [2013] posits that actor i interacts with actor j at a baseline rate & 
modulated by the pair's covariate Xij whose value at time t is known and /3q is a com- 
mon parameter for all pairs. In Perry and Wolfe [2013], it is shown under some mild 
conditions that one can estimate the global parameter f3 consistently In Stomakhin 
et al. [2011], the intensity is modeled for adversarial interaction between macro level 
groups, and a problem of nominating unknown participants in an event as a missing 
data problem is entertained using a self-exciting point process model. In particular, 
while no explicit intensity between a pair of actors (gang members) is modeled, the 
event intensity between a pair of groups (gangs) is modeled, and the spatio-temporal 
model's chosen intensity process is self-exciting in the sense that each event can affect 
the intensity process. 

When data T> is the only information at hand, a common approach is to con- 
struct a time series of (multi-)graphs to model association among actors. For such 
an approach, a simple method to obtain a time series of graphs from T> is to "pair- 
wise threshold" along a sequence of non-overlapping time intervals. That is, given 
an interval, for each pair of actors i and j, an edge between vertex i and vertex j is 
formed if the number of messaging events between them during the interval exceeds a 
certain threshold. This is the approach taken in Cortes et al. [2003], Eckmann et al. 
[2004], Adamic and Adar [2005], Lee and Maggioni [2011] and Ranola et al. [2010], to 
mention just a few examples. The resulting graph representation is often thought to 
capture the structure of some underlying social dynamics. However, recent empirical 
research, e.g., Choudhury et al. [2010], has begun to challenge this approach by noting 
that changing the thresholding parameter can produce dramatically different graphs. 

Another useful approach when V is the only information available is to use a 
doubly stochastic process model in which count processes are driven by latent factor 
processes. This is the approach taken explicitly in Lee and Priebe [2011] and Tang 
et al. [2013], and this is also done implicitly in Chi and Kolda [2012]. In Lee and 
Priebe [2011] and Tang et al. [2013] interactions between actors are specified by prox- 
imity in their latent positions; the closer two actors are to each other in their latent 
configuration, the more likely they exchange messages. Using our model, we consider 
a problem of clustering actors "online" by studying their messaging activities. This 
allows us a more geometric approach afforded by embedding 0(n 2 ) data to an 0{n x d) 
representation for some fixed dimension d. 

In this paper, we propose a useful mathematical formulation of the problem as 
a filtering problem based on both a multivariate point process observation and a 
population latent position distribution. 
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3. Notation. As a convention, we assume that a vector is a column vector 
if its dimension needs to be disambiguated. We denote by T t the filtration up to 
time t that models the information generated by undirected communication activities 
between actors in the community, where "undirected" here means we do not know 
which actor is the sender and which is the receiver. We denote by Mi(R d ) the space 
of probability measures on R d . For a probability density function defined on R d , 
4>(x; c, s) denotes the probability density function that is proportional to <f)(s~ 1 (x — c)) 
where the normalizing constant does not depend on x. The set of all r x c matrices 
over the reals is denoted by M riC . For each fci x fc 2 matrix M € M / t 1 j C2 , we write 

\\ m \\f = (Er=i Ecli M rc) 1/2 - Given a vector v € R d , we write for its Euclidean 
norm. Let R+ = (0, oo) and M£ fc2 := {M e M klM : M r . c > 0}. For each M x 
and Mi G Mfc li fc 2 , we write M\ * M 2 for the Hadamard product of M\ and M 2 , i.e., 
* denotes component-wise multiplication. Given vectors v\,. . . ,v n in R d , the Gram 
matrix of the ordered collection v — (v\, . . . , v n ) is the d x d matrix G such that its 
(r, c)- entry GV, C is the inner product (v r ,v c ) — vjv c of v r and v c . Given a matrix 
M £ M^d, diag(M) is the column vector whose fc-th entry is the fc-th diagonal clement 
of M. With a slight abuse of notation, given a vector v <G we will also denote by 
diag(w) the d x d diagonal matrix such that its fc-th diagonal entry is v k - We always 
use n for the number of actors under observation and d for the dimension of the latent 
space. We denote by X the n-fold product R d x • • • x R d of R d . An element of X will be 
written in bold face letters, e.g. x e X. Similarly, bold faced letters will typically be 
used to denote objects associated with the n actors collectively. An exception to this 
convention is the identity matrix which is denoted by Id = I, where the dimension 
is specified only if needed for clarification. Also, we write 1 as the column matrix of 
ones. With a bit of abuse of notation, we also write 1 for an indicator function, and 
when confusion is possible, we will make our meaning clear. 

4. Hierarchical Modeling. Our actors under observation are assumed to be a 
subpopulation of a bigger population. That is, we observe actors {1, . . . , n} that are 
sampled from a population for a longitudinal study. We are not privy to the actors' 
latent features that determine the frequency of pairwise messaging activities, but we 
do observe messaging activities T> t — {(te,ie,je) ■ t( < t}. A notional illustration of 
our approach thus far is summarized in Figure 4.1, Figure 4.2, and Figure 4.3. In 
both Figure 4.2 and Figure 4.3, t\ represents the (same) initial time when there was 
no cluster structure, and r 2 and T3 represent the emerging and fully developed latent 
position clusters which represent the object of our inference task. 

Population density process level. The message-generating actors are assumed to 
be members of a community, which we call the population. The aspect of the popu- 
lation that we model in this paper is its members' distribution over a latent space in 
which the proximity between a pair of actors determines the likelihood of the pair ex- 
changing messages. The population distribution is to be time-varying and a mixture 
of component distributions. 

The latent space is assumed to be R d for some d <G N, and the population distri- 
bution at time t is assumed to have a continuous density (if To be more precise, we 
assume that the (sample) path t — > m is such that for each y e R d , 




e 



where 
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Fig. 4.1: Hierarchical structure of the model. The top, middle and bottom layers 
correspond to the population level, the actor level and the messaging level, respec- 
tively. The top two levels, i.e., the population and the actor levels, are hidden and 
the bottom level, i.e., the messaging level, is observed. See also Figure 5.1 for a more 
detailed diagram. 



• qt £ is a smooth sample path of a stationary (potentially degenerate) diffusion 
process taking values in (0, 1), 

• 4> is a probability density function on R d with convex support with its mean 
vector being the zero vector and its covariance matrix being a positive definite 
(symmetric) matrix, 

• Ct.e is a smooth sample path of an Revalued (potentially non-stationary or 
degenerate) diffusion process, 

• at/ is a smooth sample path of a stationary (potentially degenerate) diffusion 
process taking values in (0, oo). 

Note that it is implicitly assumed that ~YliiQt,i = 1j an d additionally, we also assume 
that for each k = 1, . . . , d and m £ N, the m-th moment (x™, Mt) of the fc-th coordinate 
of fi t is finite, i.e., (x™,Mt) : = J x^fj, t (x)dx < oo. 

In this paper, we take q t j, ct,£ and ctt t i as exogenous modeling elements. However, 
for an example of a model with yet further hierarchical structure, one could take a 
cue from a continuous time version of the classic "co-integration" theory, e.g., see 
Comte [1999]. The idea is that the location q of the £-th center is non-stationary, 
but the inter-point distance between a combination of the centers is stationary. More 
specifically, one could further assume that there exist d x (d— do) matrix aj_, d x do 
matrix a and do x d matrix (3 such that 

• (a±) T a is the (d — do) x do dimensional zero matrix, 

• (aj_) T c t ,m is a (d — do) dimensional Brownian motion, 
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• P Ct,m is a stationary diffusion process. 
Thus, the position of centers are unpredictable, but the relative distance between each 
pair of centers are as predictable as that of a stationary process. 



Algorithm 1 Simulating a sample path of a population density process 
Require: ((q(t),c(t),a(t)) : t <G [0,T]) and At 

l: procedure PopulationProcess 

2: t <- 

3: while t<Tdo 

4: Vt{x) <~ E ra Qt,m<l>(x; Ct,m, &t,m) 

5: t^t + At 

6: end while 
7: end procedure 




(a) time r\ (b) time T2 (c) time T3 

Fig. 4.2: A notional depiction of the evolution of the full population and subpopulation 
latent position distributions. At each time 73, the lightly-colored outer histogram 
represents the latent position distribution fi t for the full population, and the darkly- 
colored inner histogram represents the distribution of the latent positions of actors 
under consideration. The illustrated temporal order is T\ < r 2 < 73. 



Actor position process level. Figure 4.2 sketches the connection between actors 
and populations. We first define a process for a single actor. To begin, for each t, 
given fi t and (w t , a t ) € (0, 1) x (0, 00), we write 

k=l K fei,fc a =X 1 2 



where 



b k t (x) =2{1-Ut)jil> ) (,/,, - .<•,,)//, (/yjr///. 

af(x) = (1 - uj t ) 2 J -0 (~^~ ) { '< ] ' " •''',-)(/// - )/'/(. .7 )'///• 
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Algorithm 2 Simulating a single actor's latent location process 
Require: At, ((uj t ,a t ) : t G [0,T]), and (p t : t G [0,T]) 

1: procedure LatentProcess 
2: Compute b t {x) and at(x) 

3: Compute the non-negative definite symmetric square root y/ a t (x) of a t (x) 

4: t <r- 

5: while t < T do 

6: AW(i) «- StandardNormal Vect or 

7: X(t + At) = X(t) + b t {X(t))At + y/a t (X(t))AW(t)VAl 

8: t^t + At 

9: end while 
10: end procedure 



The formulation here for the b\ and a\ l is based on a quadratic Taylor-series approx- 
imation of a so-called "bounded confidence" model studied in Gomez-Serrano et al. 
[2012]. Here, the value of oj t ,i represents the confidence level of actor i on its current 
position and a t ,i represents the visibility of other actors' position by actor i. Roughly 
speaking, an actor with a small value of oj t ,i and a large value of a t ,i will be influenced 
greatly by actors that are positioned both near and far in the latent space whereas 
an actor with a large value of u>t,i and a small value of otj. will be influenced only a 
small amount by actors that are nearby in the latent space. For further discussion on 
our motivation for the form of At, see Appendix A. 

For each actor i, the deterministic path t — > (u}t,i,&t,i) 1S assumed to be contin- 
uous, taking values in a compact subset of (0, 1) x (0, oo). It is assumed that given 
t — > fit, each actor's latent position process Xi — {X t .i : t G [0,T]) is a diffusion pro- 
cess whose generator is A t , and moreover we assume that Xi , . . . , X n are mutually 
independent. For each t, let 

Xt = {X tj i, . . . , X t ,n) T , 

where each X tji is assumed to be a column vector, i.e., adxl matrix. In other words, 
the «-th row of X t is the transpose of X t ^. 

Messaging process level. Denote by N t ^j the number of messages sent from 
actor i to actor j. Also, denote by N t ^j the number of messages exchanged between 
actor i and actor j. Note that N t .ij — N tl i->j + Nj^^t). For each actor i, we assume 
that the path t — > X t ^ is deterministic, continuous and takes values in (0, oo). For 
each t, we assume that 

P[JW,i->j = N t ^j + l\F t ,X s ,s<t} = (\ t ,i\tj/2)Pt,i-+j(X t )dt + o(dt). 

For our algorithm development and Experiment 1 in Section 6, we take 

p t ,i->j(x) := pt.i^j(xi) := P[X tJ = Xi\F t ], (4.1) 

but for Experiment 2 in Section 6, we take Pt,i->j {x) = cxp( — \\xi~ x } :\\ 2 ). Next, by way 
of assumption, for each pair, say, actor i and actor j, we eliminate the possibility that 
both actor i and actor j send messages concurrently to each other. More specifically, 
we assume that 

V[N l3 (t + dt) = N l3 {t) + l\Tt,X^,X SJ ,s< t] (4.2) 
= (Xi^Xj^Wipt^jiXtJ + Pt ^i(X td ))dt + o(dt). (4.3) 
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For future reference, we let 



^t,ij = M,i^t,j(pt,i,Pt,j) ■ 



(4.4) 
(4.5) 



Algorithm 3 Simulating messaging activities during a near-infinitesimally-small 
time interval 

Require: t € R+, At e K+ and {{T tJ {t), Ay(*)) eM}:l<!<j<n} 



2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 



procedure MessagingActivities 
£•*- 1 

for % •<— 1, . . . , n — 1 do 
for j (i + 1), . . . , n do 
^(t + At)*-!^*) 
if Tij(t + At) < then 
Messages^] <- (t,i,j) 
Tij (t + At) UnitExponentialVariable 
£<-£+l 
end if 
end for 
end for 
t^t + At 
end procedure 




(a) time n 



(b) time T2 



(c) time T3 



Fig. 4.3: One simulation's Kullback-Leibler divergence of posteriors at times n < 
^2 < 73. The horizontal (a;) and vertical (y) values, ranging in 1,2, ...,30, label 
actors. The more red the cell at (x, y) is, the more similar vertex y is to vertex x. 
The dissimilarity measure (KL) clearly indicates the emergence of vertex clustering. 



5. Algorithm for computing posterior processes. We denote by pt the 
conditional distribution of X(t) given Tt, i.e., for each B E B(%), 



Pt {B)=¥[X t eB\T t }. 



(5.1) 



For the rest of this paper, we shall assume that the (random) measure pt{dx) is 
absolutely continuous with respect to Lebesgue measure with its density denoted by 
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Pt(x). That is, pt{B) — J lB(x)pt(x)dx for B e B(X). Denote by p t ,i the i-th 
marginal posterior distribution of p t , i.e., for each B e p t i (B) — P[X t i € 

B \Tt], and let Pt,i{x) denote its density. 

5.1. Theoretical background. In Theorem 5.1, the exact formula for updating 
the posterior is presented, and in Theorem 5.2, our working formula used in our 
numerical experiments is given. We develop our theory for the case where u>t and at 
are the same for all actors for simplicity, as generalization to the case of each actor 
having different values for oj t ,i and a t ,i is straightforward but requires some additional 
notational complexity. 

Theorem 5.1. For each f e C b (K) and t e (0, oo), 

dp t (f) = Pt (Atf) dt + 1 T Qf Pt(dx)f(x) (\ t (x) - 11 T ) * dM t j 1, 

where Xt{x) = {^t,ij{x))i'j=i is an n x n matrix such that for each i j, X t _ij(x) = 
Pt,j(xi)/{pt,i,Pt,j) and for each i = j, \ t ,ij(x) = 1, and dM t = (dM ttij )? j=1 is an 
nx n matrix such that for each pair i ^ j, dM t ^ = dN t ^j — \,ijdt and for each pair 
i = j, dMt tij =0. 

Hereafter, for developing algorithms further for efficient computations, we make 
the assumption that for each t, 

Pt,ijk = Pt,iPt,jPt,k, (5.2) 

where pt,ijk denotes the joint density for actors i, j and k. 
Theorem 5.2. For each function f e C (K d ), we have 

dptAj) - PtAAf)dt + ]T f ( /' PMft f - PtAf)) (dN t ,ij - \ijdt) . 

j^i ^ \Pt,iiPt,j) J 

Replacing / with a Dirac delta generalized function, Theorem 5.2 states that for each 

dp t ,i(x) = A* t ptAx)dt + p t Ax) ( Pt ' j ^ X \ - 1 j (dN t ,ij - X t ,ijdt) , 

where A* denotes the formal adjoint operator of A. For use only within Algorithm 4, 
PdeTerMj j = AtPt.i{x)dt, and (5.3) 
JumpTerm m = p t Ax) {^~~) ~ : ) ( dNt >v ~ X ^J dt ) ■ ( 5 - 4 ) 

5.2. A mixture projection approach. The projection filter is an algorithm 
which provides an approximation to the conditional distribution of the latent process 
in a systematic way, the method being based on the differential geometric approach to 
statistics, cf. Bain and Crisan [2009]. When the space on which we project is a mixture 
family, as in Brigo [2011], the projection filter is equivalent to an approximate filtering 
via the Galerkin method, cf. Gunthcr ct al. [1997]. Following this idea, starting from 
Theorem 5.2, we obtain below in Theorem 5.3 the basic formula for our approximate 
filtering algorithm. 
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Algorithm 4 Updating the posterior distribution of actors' latent position over a 
near-infinitesimally- small time interval 

Require: t G R + , At e R + , ^ t and {(^, u^, u^) : ^ £ (t, t + At], 1 < ug < vg < n} 

procedure EstimateActorPosterior 
for !<-l,...,ndo 

Compute Pt+AtA from p ti using PdeTerm^ 
end for 

I <r- arg min m t m 
te-i <- t 

while t t e (t, t + At] do 

for i<- l,...,n- 1 do 

for j <— (i + 1), . . . , n do 
if {i,j} = {u e ,v e } then 

dJV t)ij = 1 
else 

dN ttij = 
end if 

Update pt+Ats using (pt, m )m=i and PdeTerm^j in (5.3) 
Update pt+At.j using (pt, m )" l=1 an d JuMPTERM tJ in (5.4) 

end for 
end for 
end while 
end procedure 



To be more specific, consider a set of probability density functions S = {4>e}- 
Then, let S < Mi{R d ) be the space of all probability density functions that can be 
written as a probability- weighted sum of (f>\, 4>2, • • • • That is, / € <S if and only if 
f(x) = J2i wi(l>i{x) for some probability vector (wi,w 2 , • • • ) T on indices 1, 2, • • • . In 
particular, for deriving our algorithms, we will assume that for some systematic choice 
of <S>, each probability density under consideration is a member of S. 

Among many possible choices for {<pk} in Theorem 5.3 are a multivariate Haar 
wavelet basis and a multivariate Daubechies basis. On the other hand, a Gaussian 
mixture model is pervasively used throughout statistical inference tasks such as clus- 
tering and classification in algorithms such as fc-means clustering. As such, we develop 
our algorithms with an eye towards use with other Gaussian mixture model-based al- 
gorithms. In Appendix B, we further develop our algorithm under the assumption 
that 

Pt,i( x ) = W t,i,^( x ' 0*> s )' 

where (p is the standard Gaussian density function defined on R d , s e R + , and the 
finite sequence {9g} C R d is to be chosen judiciously prior to implementing the algo- 
rithm. 

Preparing for our next result in Theorem 5.3, we let P be the symmetric matrix 
such that Pfe 1 ,fe 2 = (4>k 1 ,<f>k 2 )i an d for each k, let Sk be the symmetric matrix such 
that its (r, c)-entry Sk, rc is (<fik, <f>r<t>c)- Collectively, we denote by S the three-way 
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tensor whose (k,r,c) entry is Sk, rc - Let Rt,i,e be the matrix such that its (r, c)-entry 
Rt,i,e,rc is (■At,i,e<fir, ( l ) c), where A t .i,i is the differential operator such that 

K—l fci,K2 — 1 

with 

& M,£<» = 2 (! -^t,i) J i)(a~l(y-x))(y-x)k<(>(y,ct,at,e)dy, (5.6) 
a M^ 2 (' T ) = (1 — ^t,i) 2 ^ ^(^/(J/ - ^)) (V - x )kAy - x) k2 (j)(y;ct,at,e)dy. (5.7) 

Theorem 5.3. Suppose that for each t, i and x, pt,i(x) — X^fcLi W / t,i,fc0fc( a; )- 
W t; i denote the column vector whose k-th entry is Wt,i,k- Then, 

PdW t ,i ^^2q t ,iRt,i,tW t ,idt 

£ 

+ E w T -PW ' PWt l ( dNt ' ij ~ ^AtjW^PWtjdt) . (5.8) 



5.3. Algorithm for continuous embeddings. 

5.3.1. Classical multidimensional scaling. In our application, our final anal- 
ysis is completed by clustering the posterior distributions. Instead of working directly 
with posteriors, an infinite-dimensional object, we propose to work with objects in an 
Euclidean space each of which represents a particular actor. However, given p t ^ and 
Pt,j, using their mean vectors or their KL distance for clustering can be uninformative. 
For example, ii pt,i — Pt,j, then their mean vectors would be the same and their KL 
distance would be zero. However, if p t s — ptj is the density of, say, a normal random 
vector such that its mean is zero and its covariance matrix is vl for a large value of 
v, then concluding that actor i and actor j are similar could be misleading. 

To alleviate such situations in a clustering step of our numerical experiments, 
we propose using a multivariate statistical technique called classical multidimensional 
scaling (CMDS) to obtain a lower dimensional representation of p t = (p*,i)™=i- More 
specifically, we achieve this by representing each actor as a point in R d , where the 
configuration is obtained by solving the optimization problem 

min ^ElH^-^ll - 9((Pt,i,Pt,j)))\ 2 , (5.9) 

where g denotes a strictly decreasing function defined on [0, oo) taking values in 
R + . For example, one can take g(u) = cos -1 (urn) where uj e R + is chosen so that 
uiu e [0,7r/2] for all possible values u of (pt,i,Pt,j)- Another possibility among many 
others is to choose g(u) = — log(wu) if lj is chosen so that uju e (0, 1] for all possible 
values u of (p t ,i,Pt,j)- 

We denote by £(p) the set of solutions to the optimization problem (5.9). Given 
a vector p of n probability densities on R d , it can be shown that the solution set £(p) 
is not empty and is closed under orthogonal transformations. 
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In the classical embedding literature, ensuring continuous embcddings is neglected 
as it is not relevant to their applications. However, for our work, this is crucial as 
we study their evolution through time, i.e., ideally, we would like to see that a small 
change in time corresponds to a small change in latent location. In this section, we 
propose an extension to the CMDS algorithm to remedy the aforementioned non- 
uniqueness issue, and show that the resulting algorithm ensures continuity of embed- 
dings. 

In our numerical experiments, for each p, we choose a particular element £*(p) 
of the solution set £(p) so that depends on p in a consistent manner. 



Algorithm 5 Compute a unique CMDS embedding of M by minimizing the distance 
from a reference configuration Z with full column rank 

Require: a matrix Z e M nxc ; with full column rank and a symmetric matrix M £ 
M+ Xn such that diag(M) = 
l: B 4— any n x d classical MDS solution of M 
2: Compute a singular value decomposition UDV T of Z T B 
3: Return BVU T 



5.3.2. Continuous selection. By a dissimilarity matrix, we shall mean a real 
symmetric non-negative matrix whose diagonal entries are all zeros. First, fix d such 
that 1 < d < n. Then, for each n x n dissimilarity matrix M, define 

g{M) = -\(l- H T /n)M( 2 )(I - ll T /n), 
£(M) = argmin||e(M) - XX T \\ 2 F , 

where = (M?-). The elements of £} d (M) are known as classical multidimensional 

scalings, and as discussed in Borg and Groenen [2005], it is well known that C d (M) 
is not empty provided that the rank of g(M) is at least d. Our discussion in this 
section concerns making a selection £^(M) from A{M) so that the map M — > <Q(M) 
is continuous over the set of dissimilarity matrices such that g(M) is of rank at least 
d. 

Let M be a dissimilarity matrix such that the rank of g(M) is at least d. We 
begin by choosing an element of Q(M), say ^(M), through classical dimensional 
scaling. Let UT,U T be the eigenvalue decomposition of £d(M), where UU T — I and 
E is the diagonal matrix whose entries are the eigenvalues in non-increasing order. 
By the rank condition, we have En > . . . > E^d > 0. First, we formally write 

X+ = U+^, (5.10) 

where 

(i) C/+ is the n x d matrix with its ij entry Uij, 

(ii) E + is the d x d diagonal matrix whose i-th diagonal entry is Ejj. 
Dependence of X + on M will be suppressed in our notation unless needed for clarity. 
Now, if the diagonals of E + are distinct, then X + is well defined. However, in general, 
due to potential geometric multiplicity of an eigenvalue, our definition of X + can be 
ambiguous. This is the main challenge in making a continuous selection and we resolve 
this issue in our following discussion. 
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For our remaining discussion, without loss of generality, we may assume that for 
each dissimilarity matrix M, X + is well-defined by making an arbitrary choice if there 
is more than one CMDS solution. Note that the mapping M — > X + (M) may not be 
a continuous selection. We now remedy this. First, fix an n x d matrix Z and let 

UM) = {X+Q : QQ T = /} C ^(M), 

where Q runs over all d x d real orthogonal matrices. Then, define 

&(M)= argmin||A-Z|| 2 F . (5.11) 

xei d (M) 

Algorithm 5 yields the solution Q(M) and the proof of the following theorem, 
Theorem 5.4, can be found in Appendix E. 

Theorem 5.4. Suppose that the n x d matrix Z is of full column rank. Then, 
the mapping M — > Q(M) yields a well-defined continuous function on the space of 
dissimilarity matrices such that g(M) is of rank at least d. 




<■! ■ At ► '■! ■ >Ai 



Ht+At ► ^i+2At 

N t ► N t+At : N t+2At 



Fig. 5.1: A graphical representation of the dependence structure in the simulation 
experiment. The nodes that originate the dashed lines correspond to cither one of 
constant model parameters ((go^o), ( <T 0:<^o) ; and Ao) or exogenous modeling cle- 
ment (c t ). The arrows are associated with influence relationships, e.g, fit — > Ht+At 
reads fj, t influences fit+At- 



5.4. Technical observations. Here we discuss some insightful facts related to 
our model given the assumption stated in the last section. First, we have the following: 

Theorem 5.5. Fix t > and suppose that ji t {x) > for a.e. x g R d . The 
operator At is elliptic, i.e., for each x g R d , the matrix a t (x) = (af 1,fe2 (x)) is positive 
definite. 
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Proof. Note that for each z eR d , 

z T a t (x)z = (1 - cj t ) 2 J ip(a^ 1 {y - x)) (z T (y - x)(y - x) T z) fi t (y)dy 
= (l-w t f| i/>(ff t _1 (|/-2;)) \(y-x) T z\ 2 nt{y)dy. 

Note that tp(a^ 1 (y - x)) > and /j, t (y) > for each y e R d , and that |(j/ - x) T z| 2 > 

away from a subspace of R d whose Lebesgue measure is zero. It follows that 
z T a t {x)z > for each x G M d , whence a t (x) is positive definite. □ 

Now, we further examine Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 4, 
Algorithm 5, and discuss some technical points behinds these algorithms. 

In Algorithm 2, existence and uniqueness of \J a t (x) follows from Theorem 5.5. 
The continuity of x — > y/a t (x) follows from Theorem 5.5 and Strook [2008]. In 
Algorithm 3, for simulating a sample path of t — > N t .ij, we use the so-called time- 
change property; that is, we use the fact that the process given by t — > ^ is a unit- 
rate simple Poisson process, where A t ^j — J * X u ^jdu and A^ := inf{u > : A. u ,ij > 
t} and N^ ij := iV u ij| A -i . For simulation, we use its dual result, i.e., t — > N t .ij ■= 

N* ti j\ u _ A is a point process whose intensity process is \t,ij, where u -4- N* A j is 
a path of a unit-rate simple Poisson process. Also, we note that for computation 
of Xt ij, online inference is a necessary part of our simulation in Algorithm 3; that 
is, we need to compute p t s^j(x) — P[X t .j — x]^]. In Algorithm 3 and Algorithm 
4, near-infinitesimally- small means At so small that the likelihood of having more 
than one event during a time interval of length At is practically negligible. Also, 
by StandardNormal Vector in Algorithm 2 and UnitExponential Variable in 
Algorithm 3, we mean generating, respectively, a single normal random vector with 
its mean vector being zero and its covariance matrix being the identity matrix, and a 
single exponential random variable whose mean is one. 

6. Simulation experiments. In our experiments, we hope to detect clusters 
with accuracy and speed similar to that possible if the latent positions X(t) were 
actually observed even though we use only information in p t = estimated 
from information contained in Ft- We denote the end-time for our simulation as T. 
There are two simulation experiments presented in this section, and the computing 
environment used in each experiment is reported at the end of this section. 

Experiment 1. We take d = 1 and we assume that for each t £ [0, T] and actor 

1 = 1, . . . , 8, \ t s = 5, of j = 1/3 and oo t ,i = 0.1. For the population process we take 
for each t G [0, T] 

(H,i{x) = <t>{x;ct,at), 

where 

a t = 1/3, and 

!1 if t e [0,100 At), 
0.5 if t G [100 At, 250 At), 
if t G [250Af, 500At]. 
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Then, we also consider fit,n, where 

a t — 1/3, and 

'l if t e [O.lOOAi), 
c t ,ii=<0 if t e [100At,250At), 
1 if t e [250At,500Ai]. 

There is only one population density; in other words, = 1. Note that even 
with one population center, we can have more than one empirical mode for the sub- 
population. One of these modes is near zero, and another mode is near one. The 
reason for this is that because of the value of a 2 = 1/3 and of = 1/3, when an actor 
is too far away from the mode c t of the population process, the population process 
affects the actors on its tail only by negligibly small amount. In Figure 6.4 and Figure 
6.3, a sample path of the true latent position of each of eight actors is illustrated in 
black lines. It is apparent that in the ct,i, all eight actors are equally informed of the 
population mode shift, but in the c tsl case, only the last three were able to adapt 
to the change, and the first five actors are surprised by the abrupt change at time 
100A*. 

Our simulation is discretized. Our unit time is At = 0.05, and in Figure 6.4, each 
tick in the horizontal axis corresponds to an integral multiple of At. The jump term 
in our update formula is quite sensitive to the number of actors being considered. 
As such, for updating the jump term, we further discretized At into n 2 subintervals 
for numerical stability of our update iterations. For n = 8, each unit interval is 
associated with 64 sub-iterations, and the total number of the (main) iteration is 400, 
and we use (N t+ At,ij ~ N t ,ij)/n 2 instead of dN t+A te/n 2 ,ij - dN t+ At(e-i)/n 2 ,ij in each 
^-th subiteration of each main iteration staring at time t. 

To implement our mixture projection algorithm, we take s 2 = 1/2 12 = 1/4096. 
The initial position of the n = 8 actors are sampled from the initial population 
distribution fiQ. We take po,i(x) = 4>{x;Xq^,s). The discretized version R t of At is 
illustrated in Figure 6.1. For inference during our experiment, we have dropped the 
second order term and used only the first order term to keep the cost of running our 
experiment low. On the other hand, for simulating the actors' latent positions, we 
have used both the first and second order term of At- The value of P^ 1 R t W t ^dt gives 
the first part of the change in dW t ,i- Note that in both Figure 6.1a and Figure 6.1b, 
the entries that are sufficiently far off from the diagonals are near zero. 

For Ct = Ct,i, the time plot of the number of messages produced during interval 
[At£, At(£+1)] is given in Figure 6.2, and shows transient behaviors of varying degrees 
of messaging intensity over the interval. Our set up for ct = Ct,u produced a simulation 
sample output of observing 2.5 messages amongst the n — 8 actors in unit time once 
the population center changed abruptly from Ct = 1 to c t = at the start of the 
100-th unit time interval, i.e., t = 5. In other words, after t > 5, a single unit time 
is roughly associated with the amount of time during which the whole subpopulation 
of eight actors exchanges around 45 messages, or equivalently, during which each pair 
of actors exchange around 3 messages. On the other hand, in both Ct,i and ct,n for 
the interval [0, AtlOO] = [0, 5], the subpopulation messaging rate is relatively constant 
at the rate of 100 messages over each unit interval, and this is expected as all eight 
actors are tightly situated around 1. 

Our experiments for c t I and c tjII both show that the filtered positions for all 
eight actors are close to the exact positions. 
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(a) The population mean is at 1 (b) The population mean is at 



Fig. 6.1: The level plots of P _1 i?t for the discretized version R t of At, used in the 
simulation experiment for two particular cases, where the horizontal axis is associated 
with the rows of and the vertical axis is associated with the columns of P _1 i? t . 



Experiment 2. In this experiment, for each t, we have used the empirical distri- 
bution of 

-^t,n+l> • • ■ > ^t.n+L 

to obtain an estimate jut of \i t by partitioning the latent space into sufficiently small 
intervals, where we place a uniform kernel of height equal to the proportion of {X n+ i : 
i = 1,...,L} that lies in that interval. Our inference is on X t i, . . . ,X tn . Recall 
that n denotes the size of the subpopulation. The number n + L is the size of the 
full population. This set-up is closer to the motivation for our work, the bounded 
confidence model, Gomez-Serrano et al. [2012], and the connection with our model in 
this paper is made in Appendix A. In theory, the general setup in Experiment 1 is 
comparable to the setup in Experiment 2 when L in Experiment 2 is taken to be oo. 

We set L = 70, n = 30, A = .25, and lj = .2. We take the clustering based on 
Xt as the ground truth. Note that A here is comparable to <j t ,i in Experiment 1, 
or more generally in our model. We set up the simulation to observe roughly 3000 
messages amongst the n actors in unit time. This translates to 10 per actor per unit 
time. Note that this is a rough estimate as the messaging intensity is time-dependent 
and stochastic. In Figure 6.7, we have snapshots of X t — {X tj \, . . . ,X t>n ) and those 
of X t = {X t .i, . . . , X t , n ) for a single simulation run. Denote as the latency 

AC^C-C, 

where the dependency on our choice for a clustering algorithm is suppressed in our 
notation and for some e G (0, 1), 

C = inf{t e [0,T] : MARI(k(X s ), k(X t )) > 1 - e, for a.e. s G [t,T]}, 
( = inf{i G [0,T] : MARI(/t(V>*(p s )), k(X t )) >1-s, for a.e. s G [t,T]}, 

where MARI denotes a moving average of the Adjusted Rand Index (c.f. Rand [1971] 
and Hubert and Arabie [1985]) and we fix k to be a fc-means clustering algorithm for 
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100 200 300 400 500 

Time 

Fig. 6.2: The Ct = ct,i case. The number of messages per At across the time interval 
[100 At, 400 At] for the subpopulation of actors 1, . . . , 8. 



concreteness. We use the latency as a performance measure for a clustering algorithm 
k under our framework. For our projection, we use a Haar basis, i.e., a set of simple 
step functions, where the width of the intervals used in the experiment is ^. Also, 
unlike in Experiment 1, we take 

Pt,i->j(X t ,i) = exp(-||^M - ^jll 2 )- 

These changes require us to modify our algorithm slightly. However, the necessary 
modifications are straightforward, and we leave the details to the reader. 

It is important to note that we do not assume knowledge of the latent position of 
any individual, Xi(t); instead, we use only our knowledge of the overall population. 
As the number L gets larger, as shown in Gomez-Serrano et al. [2012], the dependence 
among 

At,i, . . . , X t . n+L 

diminishes, agreeing more closely with the model we specified in our framework. We 
investigate the behavior of our algorithm for small, medium and large values of L, 
showing robustness of our framework in the face of limited information. Recall that 
Figure 6.7 shows results for L = 70. Figure 6.5 compares the latency for L = 30 and 
L = 70. The clarity and accuracy of the clustering suffers with significant reductions 
in information used to estimate the priors fit- 
In Figure 6.7, we present snapshots of X t and X t = ip*(pt) for a single simu- 
lation run. Note that X t is a CMDS embedding of a dissimilarity matrix based on 




Fig. 6.3: The Ct = Ct.i case. The sample path of the true and estimated latent position 
of each of eight actors used for Experiment 1 in a black solid line and in a dashed red 
line, respectively. 



the posteriors p t . The colors denote the final cluster membership as determined from 
A:- means clustering with Xf. It is clear that the emerging cluster structure of the X t 
lags slightly behind that of X t in both accuracy and clarity; comparing the middle 
two figures, we can see that there are a few data points misclassified at time t-i- In- 
deed, Figure 6.6 shows that the clustering based on the embedded positions mirrors 
that possible with the true but unobserved latent positions with a small latency. 

Computing Environment. For Experiment 1, we used R 2.14.1 (64 bit) under 
Ubuntu 12.0.4 on an Intel Core i7 CPU 870 @ 2.93 GHz x 8 machine with 16 GB 
RAM. For a single run for 8, 16 and 32 actors, our experiment took 190, 788 and 
5384 seconds respectively. For Experiment 2, we used a Red Hat Linux cluster with 
24 nodes with 24 x 2.5 MHz CPUs and 132 GB memory each. Each Monte Carlo 
replicate took a single slot. A single replicate took approximately 3000 seconds. 

7. Conclusion and Future Work. We have described a strategy for clustering 
actors based on messaging activities. Our analysis is completed by clustering a CMDS 
embedding of posteriors. We have presented ways to simplify posterior analysis on 
two levels. The first level allows us to obtain an estimate of the posteriors in an 
online manner. The second level allows us to reduce our analysis to studying diffusion 
processes, which is often a starting point for addressing the optimal stopping problem. 

We have illustrated in our numerical experiments that the assumptions used to 
derive our two simplified approaches are mild enough to be useful for our inference 
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Time Time Time Time 



Fig. 6.4: The Ct = c^n case. The sample path of the true and estimated latent 
position of each of eight actors used for Experiment 1 respectively in a black solid line 
and in a dashed red line. 



task at hand, i.e., clustering. 

We believe that our framework has potential for tackling the problems faced by the 
social network practitioner regarding emergence of structure. We intend to develop a 
measure of confidence for our inferred latent positions. This will be crucial to many 
applications, as it will provide the decision-maker with information about whether to 
act or to wait for more data to increase the confidence in the inferred positions. A 
measure of confidence would therefore be a way to establish a stopping rule. Noting 
that we took the parameters of our model to be exogenous, we will need to explore 
robustness of our inference to incorrect parameter choices and then make explicit an 
algorithm for parameter estimation. Making our algorithm more scalable is also an 
area of our interest. These areas of future work will be key to applying our framework 
on substantial problems. 
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Appendix A. Motivation for the form of the differential operator At- 

A.l. Bounded confidence model: an adaptation. Our work in this pa- 
per is in part influenced by a so-called bounded confidence model in Gomez-Serrano 
et al. [2012] which focuses on establishing a propagation of chaos property of the in- 
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Fig. 6.5: Latency (AC) distribution for 200 Monte Carlo experiments. The translucent 
grey histogram is based on L = 30, and the cross-hatch shaded histogram is based 
on L = 70. The latency is defined as the difference between the time £ at which the 
moving average of the predictive ARI maintained a level of 1 — e for alH > ( and the 
time £ at which true locations' moving average of the ARI maintained a level of 1 — e 
for all t > £. The latency can be negative, but is generally small and positive. 



teracting particles model studied there. When denoting the actors' latent positions 
Xi(t),X2{t), ... € [0,1], in the bounded confidence model, the opportunities for (la- 
tent) position changes that each actor experiences is modeled as a simple Poisson 
process. When there is a change at time t, the change is assumed to involve precisely 
two actors, say, actor i and actor j, such that their position Xi(t—) and Xj(t—) differs 
by at most A. This yields an inhomogeneity in the rate at which actors change their 
locations. Then, the exact amount of change is specified by the following formula: 

Xi(t) = uXi(t-) + (1 - ^(t-), 
X j (t)=uX j (t-) + (l-u)X i (t-), 

where w £ (0, 1) is a fixed constant. Roughly speaking, upon interaction, actor i keeps 
w x 100 percent of its original position, and is allowed to be influenced by (1 — co) x 100 
percent of the original position of actor j, and vice versa. 

Fix constants A e (0, 1) and w € (0, 1). Then, define A by letting for each fj, and 

/, 

A(/M)f(x) = 2 f f(wx + (1 - u)y) - f(x)Mdy). 

J\x-y\<A 
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Fig. 6.6: Moving average of the /c-means clusterings of the embedded X t and X(t) 
against the fc-means clustering of Xt- Note that t 2 (cf. Figure 6.7) and ry* are nearly 
identical. 



Studied in Gomez-Serrano et al. [2012] particularly is the interaction between /j, t 
and X = (Xi, . . . ,X n ) where \it is the empirical distribution of X. As shown in 
Gomez-Serrano et al. [2012], the bounded confidence model has an appealing feature 
that the parameter space for the underlying parameters w and A can be partitioned 
according to the type of consensus that the population eventually reaches, namely, a 
total consensus and a partial consensus. In a total consensus regime, for sufficiently 
large i, everyone is expected to gather tightly around some fixed common point xq G 
[0, 1]. On the other hand, in a partial consensus regime, (depending on w and A), there 
is a finite collection of distinct values in [0, 1] separated by at least A, to exactly one 
of which each actor's position is attracted. In particular, the (asymptotic) position 
of actors yields a partition of the actor set when the exact locations of X±,X 2 , . . . 
are known. Generally, (/x t : t € [0, T]) is contracting toward for some closed convex 
non-empty disjoint subsets B\ and B 2 of [0, 1] in the sense that for some t € [0, T], 
Ha{B x U B 2 ) > fi t {Bi U B 2 ) for each s > t and Mt([0, 1]) = 1. 

In our adaptation, for analytic tractability, we replace the indicator function 
l|x— j/|<A with tp(z) = exp(— ^z T z), take Ht to be an exogenous modeling element, 
and take Wt to be potentially time dependent, yielding the operator 

A(fi)f(x) = 2j^{y-x) {f(ojx + (1 - u)y) - f(x)) n(y)dy. (A.l) 

The second numerical experiment in Section 6 focuses on the case where the 
community starts with no apparent clustering but as time passes, each actor becomes 
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Fig. 6.7: Xtj versus X tj i at times t± < ti < tq. The size of the population used to 
estimate the prior was L = 70. The first row shows the CMDS embedded positions 
(k = 2), and the second row shows the latent positions. Due to our CMDS embedding 
procedure (with rotation), the 1-dimensional embedding is the first coordinate of 
a 2-dimcnsional embedding. We show a 2-dimensional embedding for illustration 
purposes. 



a member of exactly one of clusters, where each cluster is uniquely identified by a 
closed convex subset of the latent space M. d . 



A. 2. A quadratic Taylor series approximation. In this work, we use a 
model that that captures the action in (A.l) up to the second order. To begin, note 
that 



f{x) = /Or) + Df(x) ■(z-x) + -(z~ x) T D 2 f(x)(z - x) + H.O.T., 



where Df(x) € K d and Df(x) G M^xd denote respectively the gradient and the 
Hessian of / at x, and H.O.T. denotes the higher order terms. Suppose that \Xt is 
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given. Now, we have 
Atf(x) := A(lH)f(x) 

= 2j^(y- x)Df(x) ■ (1 - u)(y - x)(, t (y)dy 

+ 2j^{y-x) Q(l - ujf{y - x) T D 2 f(x)(y - *)) » t (y)dy + H.O.T. 
= [Y,b k A^)d k f{x)+Y,Y, ak t lM ^ d lMf(A +H.O.T., 

\fe=l fei k 2 ) 

where b t {x) G M d and a t (x) e R dxd are given by the following: 
&t(z) = 2(1 -w t ) y ip{y - x)(y - x) k ii t {y)dy, 
a t lM (x) = (l-uj t ) 2 J yj(y - x)(y - x) kl (y - x) k2 fi t (y)dy. 

Dropping the term associated with H.O.T., we obtain the following: 

\ fe fei fe 2 / 

Appendix B. The mixture projection filter formula. 
B.l. Proof of Theorem 5.3. For each <j> r , we see that 

(<t> r ,dpt,i) = ^2{(t>r,4>c)dW t!itC = eJPdW tii . 

C 

We first consider the second term of the right side of (5.8). 

dH t t (x) := V ( M*)M*)>*m(x)+IhA*)>*J^(*)) pt i{x) ) dMt 

:= V ( ^)Mx)^ 3 (x)+PiA*)\^)) _ pt i{x) \ {m tf _ At ijdt) 

Now, we have that 

J 4> r (x){p tt i{x)\ t ,i^j{x) +p tij (x)X tJ ^i(x))dx 

= (A t ,iA t;J /2) J <j>r{x){pt,i{x)p t ,j{x) +p tij (x)pt,i(x))dx 
= (At,»At,j/2)2(0 r ,pt,»Pt,j) 

= \iX t ,j{4>r,Pt,iPt,j), 

and that 

if if 

At,« = A M A M ( PM ,p tJ ) = A M A tJ J] J] WftW&faM = Xt,iX t jW^PW t j. 

fei=i fe 2 =i 



On latent position inference from doubly stochastic messaging activities 



23 



Hence, 



= E w .pw e ^ PWt * {dNt ^ X ^W^PW t ^dt) . 

Next, for the first term of the right side of (5.8), we have 

((pr^AliPt.i) = ^qtA-At,i,e<l>r,Pt,i) 

e 

i c 

= E Qt,eJ~]{-A t ,i,i<l>r, <j>c)W t ,i,c 

I c 

= ^2q t jeJ 'Rt,i,iW t , 



HA- 



In summary, for each r, we have 



t t ^ f W^SrWt j T \ . T 

eJPdW t ,i = ejR t W til dt + £ ^ ,3 - ej PW t>i (aW M , - A,., A, , U , , /'II , ,<// j , 

&i \ w t.^ w t,j ) jj 

and our claim follows from this. 

B.2. Preliminary lemmas. This section contains two formulas to be used in 
the next section. Our result and proof in Lemma B.2 is stated in the same notation 
as in Lemma B.l. Recall that 7) oc (j)('j~ 1 (z — ■&)). 

Lemma B.l. Let C R d and {j e } c R+. Then, 



Ell^-^l 



-E 



7* 



Em 7" 



0* 



1 T (r* (e-diag(9)l T )) 1 

Z^n 7" 



where 6 is i/ie Gram matrix for (0 c) and V is the matrix whose (r, c)-entry is -f r 2 7 c 2 . 
Proof. Let C = X^77 2 anc ^ f° r eac ^ ^> Pi = li First, note that 

2 - ^)|f = E 77 2 0* T * - 2 * T ^ + *7^) 

= (l>7 2 ) ll-l| 2 -2^ T (E^) + E^7^ 



= E77 



2x t feptft) +iiE^n 2 -iiE^ii 2 ) + E^7^ 

V 1 ) 1 t ) t 



C \\x - Ptft - C\\ £ PtftW 2 + E 77 2 ^7^- 
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Now, 



ciiE^ii 2 -E^" 2 ^ 
^EE^^^-E^" 2 ^ 

r c r 

4 £ £ w*^c + E 7,- 2 (7,- 2 /c - 

= £ ( E E 7,- 2 7 c - 2 ^c + E ^ 2 (7,- 2 - C)Oj# r ) 

= ^ f E E 7,- 2 7 C - 2 ^c - E v 2 E -ar W) 

= ^"=5 fE E 7^ 2 7 C - 2 (^c - *J*r)) ■ 



Our claim follows from this. □ 



Lemma B.2. Let (p be the standard multivariate normal density defined 
Also, fix a sequence {lm}m=i c an< ^ a sequence {&t\m=i c ^ ' ■ 



\ d/2 

Em 7m EI 

11 T (r*(e-diag(6)l T ))l S 



• exp 



. 2 En In 



2 \ I \- X l^ 
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Proof. Using Lemma B.l, we see that 



[[(f>(x;$ m ,j m ) = Yl 



exp 



1 



exp f-oE 



a; — # r 



exp 



7r 

E 



7^ 



Em 7" 



E- 



• exp 



'll T (r*(e-diag(9)l T ))l' 



E„7« 



x ;E 



It 
Em 7" 



Me^~ 



-1/2N 



• exp 



'l 1 T (r * (9 - diag(9)l T )) 1* 



En 7" 



E,7,- 2 n,7, 2 



d/2 



x ;E 



-2 
7* 



Em 7" 



ME 



-1/2N 



7* 



exp - 



11 T (r*(9-diag(9)l T ))l^ 



Em 7" 



□ 



B.3. Formula for Rt.i,£ in a multivariate normal density case. Here, we as- 
sume, as done in Theorem 5.3, that fi t {y) = J2 e Qt,e<f>i(v) andp M (x) = J2e w t,i,i^e{y), 
where for simplicity, we have written <f>t(z) := (f)(z;9e,s) cx (f>(s^ 1 (z — 8 e j). In this 
section, we fix <j> to be the standard multivariate normal density defined on R d and 
recall that ip{z) — <j>(z) / <fi(0) . Also, we fix s e R + , and a sequence {6?} C R d . 

Lemma B.3. Fix e , x eR d and s e M+. For each k, 



2 

b^ e (x) = -2(1 - uH,i){x e ) k -^- 2 JpwvlWix; e , + a?.,) 1 / 2 ), (B.l) 

"t.i ' a t,e 



2 2 

(.T - 6 £ ) kl (x- 6 e ) k2 + l{fcl = fc 2 } ' ' 



(B.2) 
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Proof. Let 

2 2 

-2 , -2\ a t,i a t,£ 

v t ,i,e = l/(er M +a M ) = 2 2 , 

°t,i ~r « t £ 
2 2 
Ct,»,^ = -o . H o 1 2~^> 

*t,i + a M CT t,i + a t,l 
where to simplify the expression of Ct^t, we have used the fact that 

a t,i a t,i x (°t,i + a t,e) = { a t,i + ay)- 

Also, note that 

("m + a tl ) 1 lT ( r * ( - diag(e)l T )) 1 

- (^M 2 + « t j) _1 (- II* - fcf) 

= -Ki+«?,/)" 1 ik-<?/H 2 - 



0-7 fa 



t,e 

2 

— 2 —2 
.°t,i a t,* 



-2 



— 2 —2' 

a -4 

— 2 —2' 

— 2 —2 



-x T x 



9je e 
x T e e 



~6 e -0j9 e 



Using Lemma B.2 with x = y, i?i = x and d 2 = 9g, we see that 



cxp 



-^\\y - x\A cxp ( -\^h - eA /v^™?,/)" 



i i 

2 



2 \<V 2 



= (2™?,«)' 
= (2-1)^ 



27r/(27r) 



d/2 



( 



cxp 



V 



W.< a *.*K. ? + a J). 
e p v 2 k,+«m)J exp v 2 "*.m J 



cxp 



\\y—ct,i,i\ 

2v t ,t,t 



Then, for our claim in (B.l), it is enough to see that 



= c — X 

( a te a ti \ 

2 

a t,i la \ 
= 2 2 W ~ X > ■ 
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Next, we show our claim in (B.2). Hereafter, to ease our notation, we write c for c t ,i,e- 
First, for k\ ^ fc 2 , we have 



, \\y-4 2 \ 
cxp — yk 1 yk 2 dy 



f 1 / (y kl -c kl ) 2 \ f 1 / {y k2 - CfcJ 2 , 

= 7 exp (r^^H ' y 72^ exp \r^to^r 1 yfc ^ 

= Cfej Cfe 2 , 



and hence, 



CfciCfc 2 XfaCfa Cfc 1 Xfc 2 H-Xfc 1 Xfc 2 

(c-ar) fel (c-a;)fe 2 

O^ O^ 
2 ^ 2 _ 2 2 _ X )*2 

°t,» + a t,e 

2 

I (x-o e ) kl (x-e e )k 2 - 



On the other hand, for k\ — fc 2 , we have 

/* i / lly-c|| 2 \ 

1 expf-t^W 



/ 2 2 \ ^ 



2 2 



1 i 2 X k 2 i 2 ^-^ 



and so, we have 



/ (y2^F cxp (-^fr) (y " ' T)fei (y " xMy 

= «t,i,< + c kl c k2 - x fel c fc2 - c kl x k2 + x kl x k2 
= v t ,i.e + (c - a;)fe 1 (c - x) k2 

2 2 2 2 

a t,i a t,i a t.i , n \ °t,i n 

{6 e -x) kl ^ — ■ — 2~{0e-x)k 2 



9 , 9 1 9 i 9 V 9 . 9 



2 2 / 9 \ 



(x- 6i) kl (x- 6 e ) k2 . 



Our claim in (B.2) follows. □ 
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For Lemma B.4 and Lemma B.5, by Qi >r ,c, we denote the Gram matrix for 
(9(,9 r ,9 c ), and define T t ^ to be as in Lemma B.l for j 2 = a\ i + a 2 e , -f 2 = s 2 , 
73 = s 2 . Let 

Co = (6 e (al + al)- 1 + 9 r s~ 2 + 9 cS - 2 )/ ((a 2 4 + a 2 ,)" 1 + S ~ 2 + s~ 2 ) , 
C 1 1 1 = s 2 + 2a 2 1 + 2a 2 t/ 



c 2 = 



1/(2tt) 2 



d/2 



S * + 2s 2 (<j 2 t +a 2 ) 



't,i T 

To simplify our notation, we let 



1 1 



ox P ( -— i T (r tii * (e £ , r , c - dia g (e,, r , c )i T )) 1 



-( CT t,» + £ V + s ) 

5 = i T (r M * (e /ir , c - dia g (e,, r , c )i T )) 1. 





9e,k 




[-2 1 1] * 








_9 c ,k_ 





Define and note 



K 



fe=i 



[-2 1 t] * e^, r , c 1. 



Also, denote by ht >r ^ c {x) the multivariate normal density defined on M d such that its 
mean vector is C and its covariance matrix is Cf 1 I. For / e C(R) and fc = 1, . . . , d, 
we write 



(f(xk))e,r,c = / f{xk)he, r ,ci x ) dx i 



and note that in particular, 

(xk)e,r,c — Co,k, 

(#fc)<,r,c = Co,fc + 3C ,fcCf , 
(^I)«,r,c = Co,fe + 6Co )fc Cf 1 + 3C7 2 . 

Starting from (5.5), it is easy to see that 

d d d 

{A t ,i,t<t>r,<t>c) = ^2(bt,i !t d k (j> r ,(j) c ) + ^ ^ (a^ 2 d kl d k2 (f>r,<Pc), 

k=l fe 1= lfe 2 =l 

and as a matter of definition, we have 

Rt,i,t = {Rt,i,t,rc)r,c=l = (Mt,i,^r,0c»^c=l G 

Lemma B.4 and Lemma B.5 are associated, respectively, with the first and the 
second terms appearing in the right side of (B.3). 



(B.3) 
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Lemma B.4. For each £,r,c and i,t, we have 



fe=l 

(9 2 x d/2 (lzi*>M ( 2 °h + 2a h ,1 2 <* + 2 <i / 
[ ™ Mj + «?,/) \* 2 + Ki + 2 <e s2 (* 2 + 2 <i + 2 <e) 2 \ 

( 1/(2tt) 2 (l + a 2 e )s 2 ^ 

i i ex p i _ i 



+ S 2 (2(7 2 . + 2a y j — y 2s2 + 2a 2. + 2a 2 e 
Proof. To ease our notation, we first let 



t i<t (x) = -(2s 2 /C 2 ) ■ <f>(x; 9 e , (a 2 + a 2 t f/ 2 ){x 9 e ) k . 



It follows that 



(b^dk^c) (B.4) 

= -(2S 2 /C 2 ) . 1 {b k t ^d k 4> r ^c) 

-2(1 - L o t ,)^(2^l l Ya 2 J(al l + a 2 t i ) 

[S ~ /( ' 2) -(b*d k d> r ,cp c ). (B.5) 



—k 

We compute {b tie d k <f> r ,(f> c ) instead of directly working with (5.6). First, we observe 
that 

1 Ki + «?,/)* 2 



(4)t,r,c (Xk)i,r,c ~ Cl ~ S 2 + 2a 2, + ^2^ 



and that 



1 

( (<?li + a le) s2 



s 2 + 2al i + 2alj (a 2 l+ a 2 ^ 



( s 2 +2<7 2. + 2a ^ )2 - 
Using Lemma B.l on the third equality, we see that 



— k 

b t ,i,e(x)d k (j) r {x)(j) c (x)dx 



I 

= J -2s 2 /C 2 <f>(x;9 e , {a 2 tl + a 2 t t ) 1,2 ){x - 9 e ) k (~^(x ~ 9 r ) k (t> r {x) S j <j> c (x)dx 
= (2/C 2 ) J <p r (x)^ c {x)4>(x- B t , (ah + al e )^ 2 )(x - 9 e ) k (x - 9 r ) k dx 
= 2 J h e ^ c (x)(xl - x k (9 e + 9 r )k + 9t ik 9 r . tk )dx 

= 2 (i x l)t,r,c ~ {Xk)\ r ,c + {(X k )e, r ,c ~ Bt,k) ((Xk)t,r,c ~ 9 r .k)) ■ 



30 



N. H. Lee, J. Yoder, M. Tang and C. E. Priebe 



Continuing with the calculation, 

— k 

{\i,t d k<t>r,<l>c) 



2 U + <? 



+ 



k \ u t,i 



a 



tJ 



1 i (-20* + r + 6 c ) k (sH t - (al + ale + s 2 ) 6 r + (a 2 , + a^c) ; 



d C\ 

s 2 + 2a 2 , + 2a 2 £ 
2^ 2 ,i + 2a t 2 , 



+ 



( S 2 + 2( x 2 4 + 2a 2 £ ) 2 

c.2 



• 1 



- a 



t,e 







( 


0e,k 




* 















[8e,k, S r ,ki c ,k] 



1. 



(B.8) 

Putting together (B.6), (B.7), (B.8) and (B.5), and plugging in the full expression for 
Ci, we see that 



(1 - c^O^Ca / s 2 (2a 2 4 + 2a 2 ,) 2a 2 , + 2a 2 



^K, + «?,/) l s2 + 2(T M + 2a M ( 52 + 2 ^ + 2a 2 ,) 



Our claim follows after summing over fc and replacing C2 with its full expression. □ 
Lemma B.5. For each £,r,c, t and i, 

d d 

E EKii 2 ^^'^) 
fe 1= i fc 2 =i 

= 1(1 - c^) 2 ^ 2 ^ 2 f (4^A E [<^,r,c <**>V,c 1] 



-26, 



+ 



E 



( x k )i,r,c (#f)*,r,c (xf,)e. r ,c 
( X k)^,r,c (% k )t,r,c { x k)l,r,c 

(x 2 k )e,r,c (Xk)e,r,c 1 



-26». 



r,k 



-20/,* Ae ltk r , k -2^ fc (6- 2 fc - s 2 ) 
# 2 ,fc - 2 8r,k0l k (# 2 fe - s 2 )^ 2 , fc 



EE 

fel fc 2 #fcl 



1 

l/(2^) 2 



[(^fe 2 }*,r,c (Xk 2 )e,r,c 1] 
\ d / 2 



-(0/ + 6V) fel 

#£,fci#r,fci 



[l -{9i + 9 r ) k2 ^,fe 2 fl r ,fe 2 ] 



^4 + s2(2(7 2 j + 2a 2 £ ); 
Proo/. Note that 

a t y 2 (.) = (l- WM ) 2 (2 7 r ( x 2 J )' i / 2 



exp 



1 fo,i + <<<V r 

2s 2 + 2a 2 i + 2a 2 / 



/ 2 2 



fci,fc 2 
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where 

A k f 2 (x) = cp(x;6 e , (al + a t 2 ,) 1/2 )(z - 9 e ) kl (x - 6 e ) k2 , 
Bfy k >(x) = ( l>(x;6e,(*l i + al e ) 1 / 2 )l{k 1 =k 2 }. 
We first compute the diagonal terms, i.e., the k\ = k 2 cases. Note that 

EE/ B^(x)di lM M*)Mx)dx 

fel k 2 



fel k 2 J 
k 

= "7 E / ^ ^' (°*,< + a «) 1/2 )( a; fc - 2fl r,*a!fc + 9l k - s 2 )(j> r {x)<j> c {x)dx 
s k J 

= —± E / h e,r,c(x){xl - 1d T . k x k + 9 2 k - s 2 )dx. 



k 

and also that 



EE / i{*i = fe}^(4,^(*(^ 

fel fe2 

= E E / ^ ^ ( ct m + «y) 1/2 )!{ fc i = k 2}(* - ^/)fci o* - w^^m^m*)** 

fel fe)2 

= E / <^ + "m) 1/2 )(^ - 6 t )ldl k 4> r {x)4> c {x)dx 

k 

= ^ E / ^ <> ( CT M + "m) 1/2 )(^ - °e)U4 - ^r,kx k + e 2 , - s 2 )<}> r (x)<f>c(x)dx 

k 

= E / h ^rA x )( x t ~ 26e,kXk + Oe,k)i x l - 2e r.kX k + 6» 2 k - s 2 )cfo. 

fe 

Next, we compute the off-diagonal terms, i.e., the k\ ^ k 2 cases. First, using our 
calculation just above, we see that we note that 



EE/ A k t fi(x)dl lM Mx)Mx)dx 

fel fe2#fel 

= /E E / h e,rA x )( x ~ ^e) kl {x - O e )k 2 {x - r ) kl (x - 8 r ) k2 dx. 



fel k 2 =£k\ ' 

Our claim follows from this after combining them together, and simplifying the com- 
bined term into a matrix notation. □ 

Appendix C. Proof for Theorem 5.1. Here, we will take the convention 
that X t = (^t,ifc)™=i fc=1 is organized as a matrix. By the i-th row of X t , we mean 
Xt,i = (X t> i,...,X tt d)- Let 

tptiv) = P [e^< Xt > = / p t {dx)e< v > x \ 
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where for each v and x, 

n d 

(v,x) = ^2^2v lk x lk . 

1=1 fe=l 

In other words, ip t is the (random) conditional characteristic function of X t . Note 

Pt(y) = e-^v t (v)dv. 

Also, let, for each v and x e R nxd , 

a t (v\x) = lim -E\e l ^' Xt +^ x ^ - 1 \X t = x]. (C.l) 

For each / € B(K), f-i denotes the function obtained by fixing all other indices 
different from the z-th actor indices but letting the i-th actor indices to be free, and 
if f-i is in the domain of the operator A(pt), with some abuse of notation, we write: 

n 

Similarly, for each v € K d , let 



e .(t>,Jf 4 (t)> ijr 



In other words, ipt,i denotes the conditional characteristic function of the i-th row 
X k (t) of X(t), and also, let, for v e K d , and x e X, 

Ot(«|a;) = lim ^[e^^M+.-^u) -l\X ti = x]. (C.2) 

e->0 e 

Note that the definition of a t (v\x) is actually independent of a particular choice of 
vertex i as they are all identically distributed. 

One can prove the next result by directly following Snyder [1975], but one needs 
to adapt to the fact that the underlying process can now be a time-inhomogeneous 
non-linear Markov process. The proof details are left to the reader. For a survey of 
similar techniques, see also Kunita [1997] and Bain and Crisan [2009]. 

Proposition C.l. For each v e R nxd and t e (0, oo), 

diptiv) = (p u e l ^a t (v\-))dt + l T ( Pt7 e^ (A - ll T )')dM t l. 



Our proof of Theorem 5.1 is by brute force calculation, starting from Proposition 
C.l. In particular, our claim in Theorem 5.1 follows from Proposition C.l by directly 
applying Lemma C.2, Lemma C.3 and Lemma C.4 which we list and prove now. 

Lemma C.2. For each t e [0,oo), 



a t (v\x) = A(» t y {v '- X) (x). 
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Proof. Fix t, i, v and x. Then, for each e > 0, we have: 
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S 



e z{v,X iit+s -x) , Xti 



We have 



and hence, 



It follows that 



1+ / E 

/o 



A{nt+s)e^ v '-^ (X t+S!i )\X t!i =x 



sup 



< 


e i(v, — x) 







E A(fi t+s )e^- x Hx t+s )\X t = x <1 



lim- / E 

eiO e J a 



E 



A(jH+.)e t < v -- x >(X t+a )\X t = x 
A(pt)e< v '- x \x t )\X t = x 



ds 



= A{^ t )e^- x \x) 



ds. 



Lemma C.3. For each f e C b (K), we have: 
jj iy) {h J e ~ l(V ' V Hpt,e l(v ^a t [v\-))dv^j dy = J^Mpt) f) (z) Pt (dz). 



Proof. Fix y € X and note: 



2 ^ / e_l< " ,y> / Pt(dz)e^a t (v\z)dv 
= J p t (dz) (J- J e -<«.»> e '<«.*>o t (t>|z)cfo) , 



and that 



lje^-y>a t ( V \ z) d V =ll 



e z{v,z-y) Um _ E e4 <«,X t+e -z) _ ! |X, = z 

£^0 £ 



1 f , 1 

— / lim -E 

27r / e^o e 



; t(t>,X t+e -y) _ e »(t>,a-i/) \v _ 



t = Z 



dv 
dv. 
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Treating h(y) as a generalized function (i.e. a tempered distribution), we have: 
J h(y)f(y)dy 



i(v,X t+c -y) _ e i(v,z-y) \v _ z 



dv dy 



= j Pt{dz) (//(y) (^/lim^E 
= J Pt (dz) lim - £ (e J f(y) (±- J e^ x ^-yUv^j dy\X t = z -J f(y) (±- J 
= j pt(dz) lim ^ ^E J f(y)5 (X t+e -y)dy\X t =z - [j f(y)S (z - y)dy)) 
= J p t (dz) lim \ (E [f(X t+£ ) \X t = z\- f(z)) 
= J p t {dz)(A{ Pt )f)(z). 

□ 

Lemma C.4. For each / e C b (K), we have: 
fj{y) / e-<»'»><p t ,e^>->(A(.) - ll T ))rf^ dy = j^ Pt {dz)f{z) (\(z) 11 T ) j 

Proof. Note 

jf/(y) / e-*<»-«><p t ,e*<»-->(A(-) -ll 1 "))d«)dv 
= | p t (dz) (A(z) - 11 T ) (| /(y) (i- 1 e'<«.*-»>dt7) V 
= J Pt (dz) (A(z) - 11 T ) f(y)S (z - y)dy 
= J Pt {dz) (A(z) - 11 T ) . 



- y) dv J dy 



Appendix D. Proof of Theorem 5.2. Recall that for each / e 
denotes the function obtained by fixing all other indices different from the z-th actor 
indices but letting the i-th actor indices to be free. Fix u£ {1, . . . , n}. Let / e £>(X) 
be such that /(z) = f- u (z u ) for all z e X. For each t e (0, oo), 



d(pt,f) = (pt,u,A(p t )f-u)dt 



+ £ 

•<i,i^»,i^« 



Pt,u,i,j{dZu)dZi)dZj^f— u {^Zy^) 



(pt,i,Pt,j) 



- 1 dM 



t (dzi,dz u )f- u (z u ) 



ptA z i) 
(pt,i,pt,j) 



- 1 dM 



t,iu- 



Then, the claimed formula follows from our assumption in (5.2). 

Appendix E. Proof of Theorem 5.4. Suppose that M e — > Mo as e — > and 

that for each £ > 0, M £ satisfies the rank condition, i.e., q{M £ ) is of rank at least d. 
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Note that each ^(M £ ) is a non-empty compact subset of R nxd since ||X ei+ Q|||i = 
||X £j+ |||, for any real orthogonal matrix Q. In particular, for sufficiently small Eq, 
we may assume that sup e£ r £o i ||^(M £ )|||, < oo. It is enough to show that for each 
arbitrary convergent subsequence of {Q(M e )}, 

lim£(M e )=£(M ). (E.l) 

Consider an arbitrary convergent subsequence of {£j(M e )}. We begin by observing 
some linear algebraic facts. First, any sequence of real orthogonal matrices has a 
convergent subsequence whose limit is also real orthogonal. Next, since both X Et+ 
and Z are of rank d, there exists a unique real orthogonal dxd matrix Q £t + such that 

&{M e ) = X et+ Q e ,+ , 

and in fact, Q e ,+ — U £t+ Vj + where Xj + Z — U e ,+S e ,+Vj + is a singular value decom- 
position of Xj + Z, and U E . + V^ + is the corresponding unique right factor in the polar 
decomposition of Xj.Z. Note that this implies the well-definition part of our claim 
on Q. Also, since M e — > M , we have that 

d 

lim V |S e . 4 , - £ ,«| 2 < lim \\M e - M f F = 0. 

i=l 

For relevant linear algebra computation details for these facts, see Horn and Johnson 
[1985, pg. 69, pg. 370, pg. 412, and pg. 431]. 

Now, by taking a subsequence if necessary, we also have that for some nxd matrix 
U* such that UjU* = I, limbec U s>+ = £/*. Then, 

limAV = lim U E ^ E .+ 1/2 = £/*£ ,+ 1/2 = X,. 

Next, note that if £o,+ has distinct diagonal elements, then we also have {/* = ?7o,+ 
so that X* = Xq_ + . On the other hand, more generally, i.e., even when there are some 
repeated diagonal elements, we can find a d x d matrix such that X* = Xq. + Q*. 
To see this, note that the i-th column of is also an eigenvector of g(M ) for 
the eigenvalue Eo.+,ii: an d UjU* = I, and hence it follows that for some dxd 
real orthogonal matrix Qj, we have U*Qj = Uo,+ - Moreover, exploiting the block 
structure of £o.+ owing to algebraic multiplicity of eigenvalues, we can in fact choose 
Qj so that Q^+ 1/2 = S ,+ 1/2 Q*. Then, 

X, = U^ ,+ 1/2 = U,QjQ^ ,+ 1/2 = tV+£ ,+ 1/2 Q* = *o,+Q*. 
Now, we have 



&{M ) = X , + Q ,+ 
= X*Q~l Qo,+ 

= ]im(X e>+ Q e>+ Ql + )QjQ 0>+ 
= lwae d (M e )(]\mQl + )QjQ 0}+ 
= \imJ* d (M e )Q, 
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where Q = (lim £ ^ Ql +)QlQo,+ is a d x d real orthogonal matrix and and implicitly 
the limit was taken along a further subsequence when necessary. Moreover, 

\\\imC d {Me)-Z\\ 2 F > \\e d {Mo)-Z\\l 

= \\limti(M E )Q-Z\\% 

e— >0 

= lim\\Q(M e )Q - Z\\% 
> \im \\C d (M £ )-Z\\ 2 F 

e— ¥0 

= \\limQ(M e )-Z\\ F . 

In summary we have: 

\\limU(M e )-Z\\ 2 F = \\C d (M)-Zf F . 

By definition of £^(M ), along with the facts that (i) all of the convergent subsequences 
share the common limit, (ii) each subsequence has a convergent subsequence, and (iii) 
A ,+ and Z have of full column rank d, we have (E.l). 

References. 

L. Adamic and E. Adar. How to search a social network. Social Networks, 27:187- 
2003, 2005. 

P. K. Andersen, O. Borgan, R. Gill, and N. Kciding. Statistical Models Based on 
Counting Processes. Springer, 1995. 

A. Bain and D. Crisan. Fundamentals of Stochastic Filtering. Springer, 2009. 

I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Appli- 
cations. Springer, 2005. 

P. Bremaud. Point processes and queues. Springer- Verlag, 1981. 

D. Brigo. The direct geometric structure on a manifold of probability densities 
with applications to filtering. 2011. URL http://arxiv.org/abs/llll.6801. 

E. Chi and T. Kolda. On Tensors, Sparsity, and Non-negative Factorizations. SIAM 
Journal on Matrix Analysis and Application, 33(4), 2012. 

M. De Choudhury, W. Mason, J. Hofman, and D. Watts. Inferring relevant social 
networks from interpersonal communication. In In Proc. 19th Intl Conf. World 
Wide Web, New York, pages 301 — 310. Association for Computing Machinery, 2010. 

F. Comte. Discrete and continuous time cointegration. Journal of Econometrics, 
(88):207-226, 1999. 

C. Cortes, D. Pregibon, and C. Volinsky. Computational methods for dynamic graphs. 
Journal of Computational and Graphical Statistics, 12:950-970, 2003. 

J. -P. Eckmann, E. Moses, and D. Sergi. Entropy of dialogues creates coherent struc- 
ture in e-mail traffic. Proceedings of the National Academy of Sciences of the United 
States of America, 101:14333-14337, 2004. 

J. Gomez-Serrano, C. Graham, and J.-Y. Le Beudec. The Bounded Confidence Model 
of Opinion Dynamics. Mathematical Models and Methods in Applied Sciences, 22, 
2012. 

J. Gunthcr, R. Beard, J. Wilson, T. Oliphant, and W. Stirling. Fast Nonlinear Fil- 
tering via Galerkin's Method. In Proceedings of the American Control Conference, 
1997. 

N. Heard, D. Weston, K. Platanioti, and D. Hand. Bayesian anomaly detection 
methods for social networks. Ann. Appl. Statist., 4:645-662, 2010. 



On latent position inference from doubly stochastic messaging activities 



37 



R. Horn and C. Johnson. Matrix analysis. Cambridge, 1985. 

L. Hubert and P. Arabie. Comparing partitions. Journal of the Classification, 1985. 
H. Kunita. Stochastic flows and stochastic differential equations. Cambridge Univer- 
sity Press, 1997. 

J. D. Lee and M. Maggioni. Multiscale Analysis of Time Series of Graphs. In 
Proc. SampTA, 2011. 

N. H. Lee and C. E. Priebe. A Latent Process Model for Time Series of Attributed 
Random Graphs. Statistical Inference for Stochastic Processes, 14(3):231-253, Oc- 
tober 2011. 

P. Perry and P. Wolfe. Point process modelling for directed interaction net- 
works. Journal of the Royal Statistical Society, Series B, 2013. URL 
http : //arxiv . org/abs/1011 . 1703. 

W. Rand. Objective Criteria for the Evaluation of Clustering Methods. Journal of 
the American Statistical Association, 1971. 

J. Ranola, S. Ahn, M. Sehl, D. Smith, and K. Lange. A Poisson model for random 
multigraphs. Bioinformatics, 26, 2010. 

D. Snyder. Random point processes. John Wiley & Sons Inc, 1975. 

A. Stomakhin, M. Short, and A. Bertozzi. Reconstruction of missing data in social 
networks based on temporal patterns of interactions. Inverse Problems, 2011. 

D. W. Strook. Partial Differential Equations for Probablist. Cambridge University 
Press, 2008. 

M. Tang, Y. Park, N. H. Lee, and C. E. Priebe. Attribute Fusion in a Latent Process 
Model for Time Series of Graphs. IEEE Transactions on Signal Processing, 61(7): 
1721-1732, 2013. 



